Robustness of coalescent estimators to between-lineage mutation rate variation.
نویسنده
چکیده
Data from HIV and from human neoplastic cells can show substantial between-lineage mutation rate variation even within a single population. Such variation may affect estimators of population quantities such as Theta = 4N(e)mu. Using simulated DNA data, I measured the effect of rate variation on recovery of Theta by the summary-statistic estimator of Watterson (Watterson GA. 1975. On the number of segregating sites in genetical systems without recombination. Theor Popul Biol. 7:256-276) and the coalescent maximum likelihood algorithm LAMARC (Kuhner MK. 2006. LAMARC 2.0: maximum likelihood and Bayesian estimation of population parameters. Bioinformatics. Advance Access doi: 10.1093/bioinformatics/btk051). Watterson's estimator showed a downward bias, as expected, with high values of Theta. LAMARC's mean estimate was accurate for all tested values of Theta and rate variation except for a downward bias when rate variation was maximal (i.e., the slow rate was zero). LAMARC had consistently narrower confidence intervals (CIs) than Watterson's estimator. Both methods tended to reject the truth too often when rate variation was 8x or greater and independent among branches, as well as when variation was 4x or greater and correlated among branches. In the case of Watterson's estimate, this excess rejection was fully attributable to variation among genealogies in the amount of total branch length associated with the fast and slow rates. However, in the case of LAMARC, some excess rejection was still observed even when between-genealogy variation was taken into account. Both estimators are robust to modest rate variation; however, their use should be coupled with a statistical test to rule out extreme rate variation as the resulting CIs may not be reliable.
منابع مشابه
The number of alleles at a microsatellite defines the allele frequency spectrum and facilitates fast accurate estimation of theta.
Theoretical work focused on microsatellite variation has produced a number of important results, including the expected distribution of repeat sizes and the expected squared difference in repeat size between two randomly selected samples. However, closed-form expressions for the sampling distribution and frequency spectrum of microsatellite variation have not been identified. Here, we use coale...
متن کاملComparison of Y-chromosomal lineage dating using either evolutionary or genealogical Y-STR mutation rates
We have compared the Y chromosomal lineage dating between sequence data and commonly used Y-SNP plus Y-STR data. The coalescent times estimated using evolutionary Y-STR mutation rates correspond best with sequence-based dating when the lineages include the most ancient haplogroup A individuals. However, the times using slow mutated STR markers with genealogical rates fit well with sequence-base...
متن کاملCalibrating a molecular clock from phylogeographic data: moments and likelihood estimators.
We present moments and likelihood methods that estimate a DNA substitution rate from a group of closely related sister species pairs separated at an assumed time, and we test these methods with simulations. The methods also estimate ancestral population size and can test whether there is a significant difference among the ancestral population sizes of the sister species pairs. Estimates present...
متن کاملCoalescent theory for a partially selfing population.
A coalescent theory for a sample of DNA sequences from a partially selfing diploid population and an algorithm for simulating such samples are developed in this article. Approximate formulas are given for the expectation and the variance of the number of segregating sites in a sample of k sequences from n individuals. Several new estimators of the important parameters theta = 4N mu and the self...
متن کاملA coalescent-based estimator of admixture from DNA sequences.
A variety of estimators have been developed to use genetic marker information in inferring the admixture proportions (parental contributions) of a hybrid population. The majority of these estimators used allele frequency data, ignored molecular information that is available in markers such as microsatellites and DNA sequences, and assumed that mutations are absent since the admixture event. As ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Molecular biology and evolution
دوره 23 12 شماره
صفحات -
تاریخ انتشار 2006